Development of the Estonian SpeechDat-like Database

Authors

  • Einar Meister
  • Jürgen Lasn
  • Lya Meister

Abstract

A new database project was launched in Estonia last year. It aims to collect telephone speech from a large number of speakers for speech and speaker recognition purposes. Up to 2000 speakers are expected to participate in the recordings. The SpeechDat databases, especially the Finnish SpeechDat, have been chosen as the prototype for the Estonian database. This means that the principles of corpus design, file formats, and recording and labelling methods implemented by the SpeechDat consortium will be followed as closely as possible. This paper is a progress report on the project.

Similar resources

Basque Speecon-like and Basque SpeechDat MDB-600: speech databases for the development of ASR technology for Basque

This paper introduces two databases specifically designed for the development of ASR technology for the Basque language: the Basque Speecon-like database and the Basque SpeechDat MDB-600 database. The former was recorded in an office environment according to the Speecon specifications, whereas the latter was recorded through mobile telephones according to the SpeechDat specifications. Both datab...

Towards Large Databases for Music Information Retrieval Systems Development and Evaluation

In the context of MIR/MDL evaluation, a key component for evaluation would be the availability to the research community of a large corpus of test data consisting of both audio and structured music data. This paper proposes a possible path towards this goal by following the basic principles of the SpeechDat projects. SpeechDat refers to successive EC supported projects of large scale multilingu...

SpeechDat Cymru: A Large-scale Welsh Telephony Database

We describe the collection of SpeechDat Cymru, a 2000-speaker speech recognition database for the Welsh language, recorded over the public switched telephone network (PSTN). It is collected as part of SpeechDat(II), an ELRA project which deals with the creation of databases in over 20 different European languages and dialects. Design issues common to all SpeechDat(II) databases are discussed, i...

The Development and Integration of the LDA-Toolkit Into COST249 SpeechDat(II) SIG Reference Recognizer

This paper presents the development of Linear Discriminant Analysis toolkit (LDA-Toolkit) and its integration into widely used COST249 SpeechDat(II) Task Force Reference Recognizer (RefRec). The crucial parts of the LDA, the determination of LDA classes, as well as the influence of the level of dimensionality reduction on automatic speech recognition performance, are discussed. Evaluation of pr...

SpeechDat(E) - Eastern European Telephone Speech Databases

This paper describes the creation of five new telephony speech databases for Central and Eastern European languages within the SpeechDat(E) project. The 5 languages concerned are Czech, Polish, Slovak, Hungarian, and Russian. The databases follow SpeechDat-II specifications with some language specific adaptation. The present paper describes the differences between SpeechDat(E) and earlier Speec...

Publication date: 2003